Announcements¶

Quiz

  • Quiz 1 scores are available on Gradescope.
  • If you received a score of 0-2: you must sign up for Monday's make-up test on the Piazza poll.
  • If you received a score of 3: submit a written explanation of how to improve any questions that you missed or where we left a comment for improvement on Gradescope by Monday.

Assignments

  • Reading Assessment 3 is due today. It covers NBFMG Sections 2.3-2.5 and Chapter 3.
  • Programming Assignment 3 is due on Thursday 2/15.
  • Reading Assessment 4 and Programming Assignment 4 will be released today, and due on Thursday 2/15.

Quiz discussion¶

Question. What issues did you find in these lines of code?

1:  def Alice_sign(message):
2:    secret_key = randint(0, 2**64)                  # sample uniformly at random
3:    public_key = PublicKey.from_sk(secret_key)      # compute Alice's public key
4:    random_k = int(time.time())                     # sample one-time randomness
5:    signature = sign(secret_key, message, random_k) # run ECDSA.sign
6:    return public_key, message, signature           # send to Bob over the Internet

7:  def Bob_verify(public_key, message, signature):   # received from Alice over the Internet
8:    return verify(public_key, message, signature)   # run ECDSA.verify, return the result

Lecture 7: Bitcoin's Security Guarantees¶

Learning Outcomes¶

By the end of today's lecture, you will learn:

  • How nodes and miners can check the validity of the blockchain.
  • How light clients can partially check the validity of the blockchain, and where they need assistance from the nodes and miners.
  • How we can formally specify the "agreement" property of blockchains as two concrete guarantees: liveness and safety.

Recap from last time¶

Bitcoin was invented in 2009 by an anonymous party (or parties) holding the pseudonym Satoshi Nakamato.

At the heart of Bitcoin is a blockchain, which is an append-only ledger that is stored by anyone on the Internet that uses the Bitcoin software.

19th century ledger

Source: Wikipedia

The most important property of the Bitcoin blockchain is that all participants eventually reach agreement over the state of the ledger. If I believe that Alice owns 3 coins, then (eventually) so should you.

In order to promote long-term agreement and stability, Bitcoin purposely sacrifices short-term speed and efficiency.

There are three types of participants within Bitcoin.

  • A light client only needs to hold one or more secret keys for a digital signature scheme (plus a small amount of data that we will discuss today).

  • A node additionally stores the entire state of the blockchain.

  • A miner additionally tries to write new blocks on the blockchain. For their service, miners receive 6.25 new Bitcoins plus the fees inside of every transaction within their block.

Cryptocurrencies like Bitcoin have been carefully designed to ensure that censorship, inconsistency, and double spending attacks are difficult to perform and not cost-effective.

  • Bitcoin has a randomized election system (i.e., a lottery) to choose the next miner. In this way, Bitcoin is censorship resistant because every miner (with a reasonable fraction of the computing power) will eventually get a chance to mine a block.
  • Bitcoin achieves consistency through its rule that miners and nodes should always choose the largest blockchain (or more precisely, the one with the most cumulative difficulty), with the tiebreaker being to choose the chain you heard first.

    This rule incentivizes miners to post their newly minted blocks as soon as possible, and combined with the randomized election system ensures that inconsistent forks in the blockchain do not last very long.

[Image source: Mango Research]

  • Finally, Bitcoin protects against double-spending attacks through its proof of work puzzles that are computationally burdensome to solve. Everyone tries to find a nonce such that the hash of the block begins with a large number of 0s. The puzzle friendliness property of SHA-256 says that each miner has no better strategy than to guess the nonce at random. Eventually someone will find and post a winning nonce in a block.

[Image source: blockchain.info]

7.1 Blockchain Validity¶

Remember that a blockchain contains individual transactions, which are placed into blocks, which are themselves linked together into a blockchain.

Nodes and miners in the system maintain a local copy of the blockchain, and they should check the validity of each new block when it is created and gossiped around the network.

Overall, here is the high-level idea of Bitcoin's consensus algorithm (adapted from Abhishek Jain's slides):

  1. New transactions are broadcast to all nodes that happen to be online right now
  2. Each miner collects new transactions into a block
  3. After solving a proof of work puzzle, a random miner gets to broadcast its block
  4. Other nodes accept the block only if all transactions in it are valid (unspent, valid signatures)
  5. Miners express their acceptance of the block by including its hash in the next block they create

In more detail, for an individual transaction to be valid, it must:

  1. Contain a valid signature (i.e., one that passes the digital signature scheme's Verify function) from everyone who is sending money.
  2. Not send out more money than the senders have.
  3. Point to a previous transaction where: (a) the sender(s) actually received the money that they are now trying to spend, and (b) this transaction has not yet been spent.

This is why Bitcoin is said to follow an unspent transaction (UTXO) model.

If I show a single transaction to a light client, then they can check properties 1 and 2 on their own, but they cannot check property 3!

The third property requires knowledge of the previous transactions in the blockchain, so only nodes and miners can verify this.

For a block to be valid, it must:

  1. Only contain transactions that are valid, as described above.
  2. Contain a winning nonce that satisfies the difficulty rule (i.e., where the hash of the entire block begins with some large number of 0s).
  3. Link to blocks that are themselves valid, all the way back to the first (or "genesis") block in the blockchain.

(There are some other constraints in the Bitcoin software beyond the ones we've described here, but these are the most important ones that we will focus on.)

Once again: if I hand an individual block to a light client, they can check property 2 on their own, but not property 3.

Only a full nodes and miners can check the validity of all transactions and blocks, since they have stored a local copy of the entire blockchain.

We'll come back to the light clients later today. First though, let's summarize what we have learned about the security guarantees of a blockchain.

7.2 Consensus via an honest majority assumption¶

Bitcoin relies on the following assumption:

The majority of the computational power used for mining (at any given time) is controlled by honest miners.

Here, the term "honest" means someone who runs the unmodified Bitcoin software, plays by the rules, and doesn't try to double-spend or otherwise cheat the system.

If this assumption holds, then a leader election based on proof of work (eventually) stops double-spending attacks. That's because a dishonest miner cannot keep up with the raw computing power of the honest majority.

On the other hand: if an attacker actually has the majority of the hash power, then they can solve proof of work puzzles faster than the honest miners and can therefore rewrite history!

Note that this assumption is different than the "anytrust" assumption, which was all that we needed to ensure censorship resistance.

To see why Bitcoin needs an honest majority of computing power, let's use the same forking picture as before, but change the scenario as follows:

  • The time in the present is actually after block D has been posted.
  • A dishonest miner wants to revoke a transaction that occurred in the past, in block B.

[Image source: Mango Research]

To do this, the miner will need to post many proofs of work in a row, very quickly. It needs to create $3 + N$ blocks faster than the real world produces $N$ blocks.

This is a Markov chain problem!

Nakamoto's whitepaper includes the following function (which I translated from C++ into Python). It calculates the probability that an attacker who controls a q fraction of the computing power of the network can execute a double-spend attack to overwrite the last z blocks in the blockchain.

In [1]:
# Code adapted from the Bitcoin whitepaper at https://bitcoin.org/bitcoin.pdf

from math import exp, pow

def AttackerSuccessProbability(q: float, z: int):
    p = 1.0 - q
    poisson_lambda = z * (q / p)
    sum = 1.0
    for k in range(z+1):
        poisson = exp(-poisson_lambda)
        for i in range(1, k+1):
            poisson = poisson * poisson_lambda / i
        sum = sum - poisson * (1 - pow(q / p, z - k))
    return sum

q = 0.1
print("q =", q)
for z in range(10):
    print("z =", z, "\t  Prob =", format(AttackerSuccessProbability(q, z),"1.7f"))
print("")
q = 0.3
print("q =", q)
for z in range(0, 55, 5):
    print("z =", z, "\t  Prob =", format(AttackerSuccessProbability(q, z),"1.7f"))
q = 0.1
z = 0 	  Prob = 1.0000000
z = 1 	  Prob = 0.2045873
z = 2 	  Prob = 0.0509779
z = 3 	  Prob = 0.0131722
z = 4 	  Prob = 0.0034552
z = 5 	  Prob = 0.0009137
z = 6 	  Prob = 0.0002428
z = 7 	  Prob = 0.0000647
z = 8 	  Prob = 0.0000173
z = 9 	  Prob = 0.0000046

q = 0.3
z = 0 	  Prob = 1.0000000
z = 5 	  Prob = 0.1773523
z = 10 	  Prob = 0.0416605
z = 15 	  Prob = 0.0101008
z = 20 	  Prob = 0.0024804
z = 25 	  Prob = 0.0006132
z = 30 	  Prob = 0.0001522
z = 35 	  Prob = 0.0000379
z = 40 	  Prob = 0.0000095
z = 45 	  Prob = 0.0000024
z = 50 	  Prob = 0.0000006

This code shows that even if a dishonest miner controls 10% of all the hash computing power on the blockchain, it will only have about a 1.3% chance of succeeding at double-spending a transaction from z = 3 blocks ago.

The attacker can increase its chance of success by buying or renting more computational power.

In response, defenders can resist this attack by waiting for more blocks to be confirmed in the public, honest chain.

Suppose there is a transaction that pays you money, but you don't trust the sender and you want to be confident that the money is "safe" and cannot be taken away from you through a double spend attack. You want to be 99.9% confident that your money is safe (i.e., that a double-spend succeeds with at most 0.1% probability).

The Nakamoto whitepaper includes the following table that shows how many blocks you need to see mined after your block, as a function of the dishonest miner's computational power.

Adversary's computational power # blocks required
10% 5
15% 8
20% 11
25% 15
30% 24
35% 41
40% 89
45% 340

There are two important takeaways from this table:

  1. The hash power required to execute a double-spend attack quickly becomes onerous enough to be not worth it, for all but the most expensive transactions. Even if the attacker controls even 10% of the Bitcoin network's computing power, executing a double spend attack will likely take more than one hour. By contrast, if the miner uses this hash power honestly to create new blocks, it could have earned (in expectation) tens of thousands of dollars in mining fees.

  2. On the other hand, if the attacker controls the majority of the hash power, then your money is never safe; even transactions made years ago could eventually be double-spent. This is called a 51% attack.

7.3 Formally defining Bitcoin's guarantees¶

Suppose you are Bob, and you see that Alice is sending a transaction on the Bitcoin peer-to-peer network to pay you $1.

  • You shouldn't believe yet that the money is yours, because the transaction isn't in a block yet.
  • Even after you see the transaction in a block B, you still shouldn't believe that the money is yours because a fork could happen and this block might not wind up on the longest chain.

[Image source: Mango Research]

  • Even after you see a couple of blocks built on top of the block B containing your transaction, you still might not believe that the money is yours because Alice might be trying to run a double-spend attack on the forked chain.

[Image source: Mango Research]

  • But surely at some point, you get to believe that the money is yours... right?

Definition. A node considers a block to be finalized if it is at least $k$ blocks deep inside of the longest chain. We'll declare a transaction to be finalized if it is part of a finalized block. (The choice of $k$ here affects the probability of an erroneous decision.)

Overall, Bitcoin's proof-of-work blockchain achieves the following security goals as long as a majority of the computing power is honest (i.e., follows the rules).

  1. Liveness: Every transaction is eventually finalized by all honest nodes.
  2. Safety: Honest nodes do not finalize different blocks at the same height.

As a simple heuristic, many people consider Bitcoin transactions to be finalized once they are 6 blocks deep in the blockchain. At that point, an attacker with 10% of the mining power has only probability $p = 0.0002428$ of rewriting history.

Note that we have not yet provided any security guarantees to the light clients. This is intentional, since light clients do not have the ability to verify transactions and blocks

In order to achieve these goals, Bitcoin is purposely designed to be:

  • Slow, in that transactions take awhile to become finalized.
  • Bounded, in that only a small number of transactions can be inserted in each block.
  • Computationally intensive, in order to distinguish the honest majority from the dishonest minority.

In Part 3 of this course, we will explore alternative ways to satisfy the liveness and safety goals with different tradeoffs.

7.4 Simplified Payment Verification¶

Let's revisit those light clients, and see how we can extend liveness and safety to them.

Remember that a light client doesn't have the hard drive space to store the 500+ gigabytes of data in the Bitcoin blockchain. Still though, they can periodically connect to the Internet and download some data.

It would be great if we could compress all transactions into a small dataset. Unfortunately, that's impossible.

Instead, we can leverage the facts that:

  1. The _nodes know the full list of transactions, and
  2. The nodes have good Internet connectivity, so they are available to provide transaction data to the light clients.

The clients just need to verify the accuracy of the transactions -- that is, to make sure that the nodes are telling them the truth. To do that, we can use a Merkle tree.

Merkle trees¶

Remember what Merkle trees do:

  • We can create a small representation of an entire set of data $S = (f_1, f_2, \ldots, f_k)$.
  • Later, we can open any element of the set and prove set membership.

In more detail: we construct a Merkle tree as follows.

  • Take the hash of each individual file $f_i$.
  • Then, take the hash of each pair of hashes.
  • Only store the hash at the root of the tree.

Generating a Merkle proof

Source: Alin Tomescu, Decentralized Thoughts blog

If you store one 32-byte hash digest corresponding to the set of files $f_1, f_2, \ldots, f_k$, then someone else can prove to you that one file (say, $f_3$) is contained within the set $S$ represented by the Merkle tree.

Verifying a Merkle proof

Source: Alin Tomescu, Decentralized Thoughts blog

Application to SPV¶

How does this relate to Bitcoin? Well, if you think about the transactions in blocks as "files," we can do something similar here.

To support light clients, we are going to make two changes to the way that Bitcoin works. The first change applies to everyone, and the second change applies only to light clients.

First, each block will contain the Merkle root of all transactions within the block. The block itself will also contain all transactions, as before.

We will call the block header as just the following items: the nonce of the current block, the root of a Merkle tree for all transactions within that block, and the pointer to the previous block's hash.

Because the Merkle root binds together all of the transactions, it now suffices to create a proof-of-work puzzle where the hash is only taken over the header, and not the transactions themselves.

[Image source: Nakamoto whitepaper]

A light client will follow the same rule as nodes: find the blockchain of the largest height (or more accurately, the most cumulative difficulty). But the light client only downloads the block headers, not the full blocks (i.e., they won't know the transactions).

This is substantially smaller:

  • Each header is 80 bytes in size, so downloading all block headers for Bitcoin is about 66 MB of data right now.
  • This is much smaller than 500+ GB for the full blocks!

With this info, a light client can:

  • Verify the proof of work puzzles themselves.
  • Query a full node to ask whether they have received money. That is: if the client believes it has received money in block $N$, it can ask a node to send (1) the transaction that pays its public key and (2) a Merkle proof that it is contained within the set represented by that block's Merkle root.

Note that a light client cannot verify:

  1. That the full block is valid (in particular, that the transactions have valid signatures)
  2. That their money hasn't yet been spent in a subsequent block.

They must rely on the nodes to perform verification check #1, and on their own memory for property #2.

This makes light clients more vulnerable to attackers than nodes. For instance, a malicious miner can just create a single "bad" block that contains invalid signatures; nodes will reject it but a light client will be fooled.

For this reason, light clients should be even more careful when finalizing a transaction.

Eventually though, light clients will inherit the liveness and safety guarantees from the nodes.